The dataset is about unemployment data in the State of NY, and it contains monthly information. This project working on forecasting unemployment which helps understand the economic health of a region.
library(readr)
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2 4.0.0 ✔ fma 2.5
## ✔ forecast 8.24.0 ✔ expsmooth 2.3
##
library(TTR)
raw_data_csv <- read_csv("Data_NewYorkEdited.csv")
## Rows: 564 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): State
## dbl (8): Code, Year, Month, NIP, Population, EmployedLF, Unemployment, Unemp...
## num (2): LF, Employment
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(raw_data_csv)
## # A tibble: 6 × 11
## Code State Year Month NIP LF Population Employment EmployedLF
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 51000 New York city 1976 1 5.69e6 3.07e6 53.9 2723016 47.8
## 2 51000 New York city 1976 2 5.69e6 3.07e6 53.9 2722421 47.8
## 3 51000 New York city 1976 3 5.69e6 3.06e6 53.9 2722931 47.9
## 4 51000 New York city 1976 4 5.69e6 3.07e6 53.9 2726299 47.9
## 5 51000 New York city 1976 5 5.69e6 3.07e6 54 2730681 48
## 6 51000 New York city 1976 6 5.68e6 3.08e6 54.1 2735049 48.1
## # ℹ 2 more variables: Unemployment <dbl>, UnemployedLF <dbl>
spec(raw_data_csv)
## cols(
## Code = col_double(),
## State = col_character(),
## Year = col_double(),
## Month = col_double(),
## NIP = col_double(),
## LF = col_number(),
## Population = col_double(),
## Employment = col_number(),
## EmployedLF = col_double(),
## Unemployment = col_double(),
## UnemployedLF = col_double()
## )
ts_unemployed <- ts(raw_data_csv$UnemployedLF, start=c(1976,1), end = c(2019,12), frequency=12)
head(ts_unemployed)
## Jan Feb Mar Apr May Jun
## 1976 11.2 11.2 11.2 11.1 11.1 11.1
tail(ts_unemployed)
## Jul Aug Sep Oct Nov Dec
## 2019 3.8 3.8 3.8 3.8 3.9 4.0
plot(ts_unemployed, ylab = "UnemployedLF", col= "red")
Acf(ts_unemployed)
The time series of UnmploymentLF from Jan 1976 to Dec 2019 shows several key characteristics:
trend: A gradual downward long-term trend, with unemployment levels decreasing toward the end of the series.
Strong seasonality: No clear seasonality is observed. Instead of seasonal repetition, the series exhibits multi-year cyclical movements.
Increasing variability:Variability is not increasing over time; fluctuations remain relatively consistent, though cycles differ in amplitude across decades.
Overall, the series displays pronounced long-term cycles and strong persistence, with slow ACF decay indicating long-range dependence rather than seasonal structure.
To generate short-term forecasts (1–2 years), a rolling window of recent data is preferred because the unemployment rate experienced structural changes over the full 40-year span. The period after 2010 is much more stable, with consistent downward movement and lower volatility. Using this window improves model stability and prevents earlier high-variance periods from distorting short-term forecasts.
unemployed_ts <- window(ts_unemployed, start = c(2010,1))
plot(unemployed_ts, main = "Unemployment Level in Labor Force")
Acf(unemployed_ts)
The ACF shows very strong and persistent autocorrelation that decays slowly, indicating a non-stationary series with a strong underlying trend. There is no visible seasonal pattern, as correlations do not show repeating peaks at regular lags.
# Display summary statistics
summary(unemployed_ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 4.500 6.250 6.737 9.100 10.100
# Create a box plot for visualizing distribution
boxplot(unemployed_ts,
main = "Boxplot of Unemployment (2010–2019)",
ylab = "unemployment(Percentage)",
col = "lightblue",
border = "darkblue")
The unemployment rates from 2010–2019 range between 3.8 and 10.1, with a median around 6.25, indicating that half of the values fall between 4.5 and 9.1. The Boxplot shows a wide IQR, suggesting substantial variability across the decade, and no extreme outliers. Overall, the distribution indicates that unemployment was generally moderate but fluctuated significantly, consistent with the downward trend seen in the time series.
For this dataset, MAPE (Mean Absolute Percentage Error) is selected as the primary accuracy metric. Since the goal is to forecast future unemployment rate and communicate performance in interpretable terms, MAPE provides a clear percentage-based measure of average prediction error.
#decomposition
stl_decomp <- stl(unemployed_ts, s.window = "periodic")
plot(stl_decomp)
attributes(stl_decomp)
## $names
## [1] "time.series" "weights" "call" "win" "deg"
## [6] "jump" "inner" "outer"
##
## $class
## [1] "stl"
This additive decomposition shows a very strong and regular seasonal pattern, repeating every year with consitent peaks and troughs.
Seasonal values range about +1 to –1, and High in Jan–Mar, low in Jun–Aug. May caused by early-year layoffs raise unemployment; summer hiring lowers it.
# seasonal adjustment
seasadj(stl_decomp)
## Jan Feb Mar Apr May Jun Jul
## 2010 10.069272 9.970443 9.891614 9.803698 9.725782 9.616041 9.526300
## 2011 8.969272 8.970443 8.891614 8.903698 8.925782 9.016041 9.126300
## 2012 9.469272 9.570443 9.591614 9.603698 9.625782 9.616041 9.526300
## 2013 9.069272 9.070443 8.991614 9.003698 8.925782 8.916041 8.926300
## 2014 7.969272 7.770443 7.591614 7.503698 7.325782 7.116041 6.926300
## 2015 6.169272 6.070443 5.991614 5.903698 5.725782 5.616041 5.426300
## 2016 5.169272 5.170443 5.191614 5.103698 5.125782 5.216041 5.226300
## 2017 4.669272 4.570443 4.491614 4.503698 4.525782 4.616041 4.626300
## 2018 4.269272 4.170443 4.191614 4.103698 4.125782 4.016041 4.026300
## 2019 4.269272 4.170443 4.091614 4.003698 3.925782 3.816041 3.826300
## Aug Sep Oct Nov Dec
## 2010 9.414213 9.422126 9.299997 9.187868 9.072647
## 2011 9.214213 9.322126 9.399997 9.387868 9.472647
## 2012 9.414213 9.322126 9.199997 9.187868 9.172647
## 2013 8.914213 8.722126 8.599997 8.387868 8.172647
## 2014 6.814213 6.622126 6.499997 6.387868 6.272647
## 2015 5.214213 5.222126 5.099997 5.187868 5.172647
## 2016 5.314213 5.222126 5.199997 4.987868 4.872647
## 2017 4.614213 4.522126 4.499997 4.387868 4.272647
## 2018 4.014213 4.122126 4.199997 4.287868 4.272647
## 2019 3.814213 3.822126 3.799997 3.887868 3.972647
# Plot a line on the graph
plot(unemployed_ts)
lines(seasadj(stl_decomp), col="Red")
After removing seasonality, the underlying downward trend becomes clearer, and seasonal fluctuations are moderate but noticeable.
# Model output
forecast_naive <- naive(unemployed_ts,12)
plot(forecast_naive, col= 'red')
nv_residual <- residuals(forecast_naive)
plot(nv_residual, main="Residuals from Naïve Forecast", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")
The residuals from the naïve model show clear, repeated patterns rather than random noise, indicating the model is not capturing the underlying structure of the data. This happens because the naïve forecast ignores both trend and seasonality, leading to systematic over- and under-prediction.
hist(nv_residual, main="Histogram of Naïve Forecast Residuals",
xlab="Residuals", col="lightblue", border="white")
The residuals are centered near zero, but the distribution isn’t symmetric.
That indicats the Naïve model doesn’t fit the data perfectly — some structure like seasonality is still left in the residuals.
plot(as.numeric(fitted(forecast_naive)), as.numeric(nv_residual),
main = "Residuals vs Fitted (Naïve Model)",
xlab = "fitted values", ylab = "Residuals")
abline(h = 0, col = "red")
The residuals vs. fitted plot shows clear horizontal banding, meaning the errors cluster at specific levels instead of being randomly scattered. This confirms the naïve model fails to capture the structure of the data and leaves strong patterns in the residuals.
plot(as.numeric(unemployed_ts),as.numeric(nv_residual),
main = "Actual Values vs Residuals (Naïve Model)",
xlab = "Actual Values", ylab = "Residuals",
type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")
The actual-vs-residuals plot again shows clear horizontal bands, meaning the residuals take only a few repeated values instead of forming a random cloud. This confirms that the naïve model leaves strong structure unexplained and is not capturing the true dynamics of the series.
Acf(nv_residual, main="ACF of Residuals - Naïve Forecast")
The ACF of the naïve model’s residuals shows strong autocorrelation at many lags, especially early lags, which stay well above the significance bounds. This indicates that the naïve model leaves substantial structure unmodeled, and the residuals are not white noise, confirming poor fit.
accuracy(forecast_naive)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0512605 0.1016668 0.07815126 -0.792328 1.256982 0.1079327
## ACF1
## Training set 0.5610783
forecast_naive$mean
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2020 4 4 4 4 4 4 4 4 4 4 4 4
autoplot(forecast_naive) +
ggtitle("Naïve Forecast") +
ylab("Unemployment(Percentage)") +
xlab("Year") +
theme_minimal()
Accuracy: The naïve model performs poorly — its MAPE is about 1.26, meaning it has large relative errors, and the ACF1 value of 0.56 shows strong autocorrelation in residuals (a bad sign). This indicates the model does not capture trend or seasonality.
What it predicts: The naïve model simply predicts that next month’s unemployment rate will be equal to the current month’s rate, offering no meaningful insight beyond repeating the last observed value.
# Compute moving averages of different orders
MA3_forecast <- ma(unemployed_ts, order = 3)
MA6_forecast <- ma(unemployed_ts, order = 6)
MA9_forecast <- ma(unemployed_ts, order = 9)
# Plot original time series
plot(unemployed_ts, main = "Simple Moving Averages (Orders 3, 6, 9)",
ylab = "Tourist Arrivals (Thousands)", xlab = "Year", col = "black")
# Add the three moving averages
lines(MA3_forecast, col = "red", lwd = 2)
lines(MA6_forecast, col = "blue", lwd = 2)
lines(MA9_forecast, col = "green", lwd = 2)
# Forecast next 4 quarters using order 6 (bonus)
fit_ma6 <- auto.arima(na.omit(MA6_forecast))
forecast_ma6 <- forecast(fit_ma6, h = 12)
lines(forecast_ma6$mean, col = "purple", lwd = 2)
legend("bottomleft",
legend = c("Original", "MA(3)", "MA(6)", "MA(9)", "Forecast MA(6)"),
col = c("black", "red", "blue", "green", "purple"),
lwd = 2,
bty = "n",
cex = 0.6,
inset = 0.02)
Observations: - As the moving average order
increases, the series becomes progressively smoother.
- The MA(3) line (red) still follows most of the
short-term fluctuations.
- The MA(6) line (blue) reduces seasonal noise and
captures medium-term trends better.
- The MA(9) line (green) is the smoothest, showing the
long-term movement but lagging behind actual changes.
- Using MA(6), the forecast for the next next 12 month predicts stable
arrivals with mild variations.
Overall, increasing the order of the moving average reduces short-term noise but delays response to sudden changes in the data.
ss_forecast <- ses(unemployed_ts, h = 12)
plot(ss_forecast)
ss_forecast$model
## Simple exponential smoothing
##
## Call:
## ses(y = unemployed_ts, h = 12)
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 10.1132
##
## sigma: 0.1021
##
## AIC AICc BIC
## 30.87491 31.08180 39.23738
The SES model behaves almost like a naïve model because α is extremely close to 1, meaning it heavily tracks the latest value and does not smooth much of the past. This makes SES less suitable for your data, which has a clear long-term downward trend and seasonal structure.
ss_residual <- residuals(ss_forecast)
plot(ss_residual, main="Residuals from Simple Smoothing Forecast", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")
The SES residuals show clear patterns rather than random noise, indicating the model fails to capture important structure in the data. The residuals also exhibit clustered positive and negative runs, confirming SES is not fitting the trend or seasonality well.
hist(ss_residual, main = "Histogram of SS model Residual", xlab="Residuals", col="lightblue", border="white")
Observations: - The SES residual histogram is roughly centered around zero, showing no clear systematic bias. - However, the distribution is skewed and not symmetric, confirming that SES does not fully capture the structure in the data.
plot(as.numeric(fitted(ss_forecast)), as.numeric(ss_residual),
main="Fitted Values vs Residuals (SS Model)",
xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")
Observations: - The residuals vs. fitted plot shows that the points form clear horizontal bands rather than a random cloud. - This indicates systematic patterns remain, meaning the SES model still fails to capture important structure in the data.
plot(as.numeric(unemployed_ts), as.numeric(ss_residual),
main = "Actual Values vs Residuals (SS Model)",
xlab = "Actual Values", ylab = "Residuals",
type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")
The residuals vs. actual values plot shows clear horizontal clustering, meaning the SES model produces repeated residual patterns rather than random noise. This indicates the model still fails to capture key structure in the data, so the fit remains inadequate.
Acf(ss_residual, main="ACF of Residuals - SS Forecast")
The ACF plot shows strong positive autocorrelation at many early lags, meaning the residuals are not behaving like white noise. This confirms that the SES model misses important structure in the data and is not an adequate forecasting model.
accuracy(ss_forecast)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.05094848 0.1012563 0.07761731 -0.7868965 1.247713 0.1071953
## ACF1
## Training set 0.5583315
ss_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2020 3.99999 3.869130 4.130850 3.799857 4.200123
## Feb 2020 3.99999 3.814935 4.185045 3.716973 4.283007
## Mar 2020 3.99999 3.773349 4.226631 3.653372 4.346608
## Apr 2020 3.99999 3.738289 4.261691 3.599753 4.400227
## May 2020 3.99999 3.707401 4.292579 3.552514 4.447466
## Jun 2020 3.99999 3.679476 4.320504 3.509806 4.490174
## Jul 2020 3.99999 3.653796 4.346184 3.470532 4.529448
## Aug 2020 3.99999 3.629894 4.370086 3.433977 4.566003
## Sep 2020 3.99999 3.607444 4.392536 3.399643 4.600337
## Oct 2020 3.99999 3.586211 4.413769 3.367169 4.632811
## Nov 2020 3.99999 3.566015 4.433965 3.336283 4.663697
## Dec 2020 3.99999 3.546718 4.453262 3.306771 4.693209
plot(ss_forecast)
hw_forecast <- hw(unemployed_ts)
forecast_hw <- forecast(hw_forecast, h = 12)
forecast_hw$model
## Holt-Winters' additive method
##
## Call:
## hw(y = unemployed_ts)
##
## Smoothing parameters:
## alpha = 0.7758
## beta = 0.164
## gamma = 1e-04
##
## Initial states:
## l = 10.258
## b = -0.109
## s = 0.0462 0.0217 0.0216 0.008 0.0077 -0.0113
## -0.0277 -0.0421 -0.0215 -0.0208 0.0066 0.0116
##
## sigma: 0.0898
##
## AIC AICc BIC
## 12.94893 18.94893 60.33629
hw_residual <- residuals(forecast_hw)
#plot residual
plot(hw_residual, main = "Residuals from Holt Winter Model", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")
#histogram
hist(hw_residual, main = "Histogram of Holt Winter model Residual", xlab="Residuals", col="lightblue", border="white")
The residual histogram is approximately centered near zero with mild right skewness, confirming that the Holt–Winters model fits the data well overall but slightly underestimates a few low seasonal points.
# fitted fitted vs residuals
plot(as.numeric(fitted(forecast_hw)), as.numeric(hw_residual),
main="Fitted vs Residuals (Holt Winter Model)",
xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")
Residuals are mostly concentrated within ±0.2, indicating that the model provides a close fit to the observed data. Only a few larger residuals appear when fitted values fall between 5 and 6, but no clear systematic pattern is present. This random scatter around zero suggests that the Holt–Winters model captures the main trend and seasonality effectively, with only minor localized errors.
# real values vs residuals
plot(as.numeric(unemployed_ts), as.numeric(hw_residual),
main = "Actual Values vs Residuals (Holt Winter Model)",
xlab = "Actual Values", ylab = "Residuals",
type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")
Most points fall within ±0.2 residuals, indicating the model fits the data well. There are only a few (≈2) larger residuals when actual values are in the 5–6 range, but these are exceptions. Overall, the Holt–Winters model captures the bulk of the trend and seasonal structure with only minor localized errors.
#ACF plot of residuals
Acf(hw_residual, main="ACF of Residuals - Holt Winter Forecast")
ACF of Residuals
accuracy(forecast_hw)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.006047645 0.08362116 0.06870748 0.1403796 1.187193 0.09489012
## ACF1
## Training set 0.5620648
forecast_hw
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2020 3.949446 3.834332 4.064559 3.773395 4.125497
## Feb 2020 3.954453 3.796479 4.112427 3.712853 4.196053
## Mar 2020 3.937111 3.734376 4.139846 3.627054 4.247167
## Apr 2020 3.946437 3.696635 4.196239 3.564398 4.328476
## May 2020 3.935865 3.636587 4.235143 3.478159 4.393571
## Jun 2020 3.960216 3.609057 4.311375 3.423165 4.497267
## Jul 2020 3.986585 3.581183 4.391986 3.366577 4.606592
## Aug 2020 4.015636 3.553691 4.477580 3.309153 4.722119
## Sep 2020 4.025927 3.505203 4.546650 3.229549 4.822304
## Oct 2020 4.049626 3.467954 4.631299 3.160035 4.939218
## Nov 2020 4.059753 3.415023 4.704483 3.073724 5.045782
## Dec 2020 4.094166 3.384331 4.804001 3.008568 5.179765
plot(forecast_hw)
Model Accuracy The Holt–Winters additive model provides strong accuracy, with a very low MAPE of about 1.19%, indicating that forecast errors are small on average and the model captures both trend and seasonality well.
Forecast for the Next Year It predicts that unemployment will remain stable over the next year, with point forecasts mostly between 3.94 and 4.05 throughout 2020, showing a slight upward drift but no major changes.
Overall, the model fits the data well, handles seasonal patterns effectively, and produces smooth, realistic forecasts with narrow confidence intervals.
auto_fit <- auto.arima(unemployed_ts)
attributes(auto_fit)
## $names
## [1] "coef" "sigma2" "var.coef" "mask" "loglik" "aic"
## [7] "arma" "residuals" "call" "series" "code" "n.cond"
## [13] "nobs" "model" "xreg" "bic" "aicc" "x"
## [19] "fitted"
##
## $class
## [1] "forecast_ARIMA" "ARIMA" "Arima"
forecast_arima = forecast(auto_fit,h = 12,level=c(99.5))
forecast_arima$model
## Series: unemployed_ts
## ARIMA(3,1,0)(2,0,0)[12] with drift
##
## Coefficients:
## ar1 ar2 ar3 sar1 sar2 drift
## 0.3529 0.5959 -0.2016 -0.0679 -0.3012 -0.0507
## s.e. 0.0914 0.0792 0.0927 0.0951 0.0919 0.0158
##
## sigma^2 = 0.003579: log likelihood = 167.65
## AIC=-321.3 AICc=-320.29 BIC=-301.85
The automatic ARIMA procedure selected an ARIMA(3,1,0)(2,0,0)12 with drift, meaning the model captures short-term AR structure, a seasonal AR component, and an overall downward trend through differencing and drift. The estimated parameters (ar1 = 0.3529, ar2 = 0.5959, ar3 = –0.2016, sar1 = –0.0679, sar2 = –0.0312) and the small estimated variance (σ² ≈ 0.0036) indicate a stable model fit.
Overall, the model is well-specified for capturing both trend and annual seasonality, and it provides a strong statistical foundation for producing year-ahead forecasts.
arima_residual <- residuals(forecast_arima)
plot(arima_residual, main = "Residual of ARIMA", xlab = "Year")
abline(h=0, col="red")
Your residuals fluctuate around 0, which indicates that the model does not systematically over- or under-predict. Residuals look like white noise: no visible trend or seasonality. There are a few big residual spikes, especially around late 2013 and late 2016, possible unexpected events there.
hist(arima_residual, main = "Histogram of ARIMA model Residual", xlab="Residuals", col="lightblue", border="white")
The shape is roughly centered around zero The distribution is fairly symmetric, although not perfectly. No extremely heavy tails or skewness.
plot(as.numeric(fitted(forecast_arima)), as.numeric(arima_residual),
main = "Fitted and Residuals of ARIMA Model",
xlab = "Fitted Values",
ylab = "Residuals"
)
abline(h=0, col= "red")
The residuals appear randomly scattered around zero with no clear pattern, suggesting the ARIMA model captures most of the underlying structure in the data. > However, slight clustering at higher fitted values indicates some remaining seasonality or nonlinear effects that the ARIMA model does not fully explain.
plot(as.numeric(unemployed_ts), as.numeric(arima_residual),
main = "Real Value vs Residuals of ARIMA",
xlab = "Real Values",
ylab = "Residuals"
)
abline(h = 0, col= "red")
The residuals show no clear pattern across the range of actual unemployment values, suggesting the ARIMA model does not systematically over- or under-predict at specific levels. A small cluster of points at higher actual values hints at slight remaining structure, but overall the residuals behave reasonably randomly around zero.
Acf(arima_residual)
All spikes fall inside the bands, no significant remaining autocorrelation. No repeating seasonal pattern and no obvous structure So overall, the ACF suggests that the ARIMA model has done a reasonable job: the residuals look mostly random, meaning the model has captured the trend and short-term correlation in the data.
accuracy(forecast_arima)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0005022391 0.05805211 0.04700228 0.04148005 0.7929878 0.06491364
## ACF1
## Training set -0.02333479
forecast_arima
## Point Forecast Lo 99.5 Hi 99.5
## Jan 2020 4.051437 3.883511 4.219362
## Feb 2020 4.115145 3.832638 4.397651
## Mar 2020 4.130416 3.682063 4.578770
## Apr 2020 4.158380 3.562627 4.754134
## May 2020 4.144127 3.386519 4.901735
## Jun 2020 4.149031 3.243134 5.054928
## Jul 2020 4.109461 3.053290 5.165632
## Aug 2020 4.063127 2.867138 5.259116
## Sep 2020 3.981989 2.648658 5.315321
## Oct 2020 3.896693 2.434237 5.359150
## Nov 2020 3.801711 2.214022 5.389400
## Dec 2020 3.734294 2.027927 5.440660
plot(forecast_arima, main = "Unemployment Next Year from ARIMA")
forecast_arima_N2 = forecast(auto_fit,h = 24,level=c(99.5))
plot(forecast_arima_N2, main = "Unemployment Next Two Year from ARIMA")
ARIMA achieves strong overall accuracy with low error rates (MAPE 0.79%), no bias, and better performance than the naive benchmark, indicating a well-fitted model.
# Create accuracy comparison table
model_comparison <- rbind(
naive = accuracy(forecast_naive),
ss = accuracy(ss_forecast),
ma = accuracy(forecast_ma6),
hw = accuracy(forecast_hw),
arima = accuracy(forecast_arima)
)
rownames(model_comparison) <- c( "Naive", "SES", "Moving Average","Holt Winter", "ARIMA")
round(model_comparison[, c("RMSE","MAE","MAPE","MASE")], 3)
## RMSE MAE MAPE MASE
## Naive 0.102 0.078 1.257 0.108
## SES 0.101 0.078 1.248 0.107
## Moving Average 0.007 0.006 0.099 0.008
## Holt Winter 0.084 0.069 1.187 0.095
## ARIMA 0.058 0.047 0.793 0.065
Best and Worst Models by Metric
| Accuracy Measure | Best Model | Worst Model | Interpretation |
|---|---|---|---|
| MAPE | Moving Average | Naïve | MA performs best on a scaled basis, far outperforming all others. |
auto_fit2 <- auto.arima(ts_unemployed)
attributes(auto_fit2)
## $names
## [1] "coef" "sigma2" "var.coef" "mask" "loglik" "aic"
## [7] "arma" "residuals" "call" "series" "code" "n.cond"
## [13] "nobs" "model" "bic" "aicc" "x" "fitted"
##
## $class
## [1] "forecast_ARIMA" "ARIMA" "Arima"
forecast_arima2 = forecast(auto_fit2,h = 12,level=c(99.5))
plot(forecast_arima2, main = "Unemployment Next Year from ARIMA without windowing")
accuracy(forecast_arima2)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.002646581 0.06835575 0.0539397 -0.02653067 0.7628524 0.05421286
## ACF1
## Training set 0.01269474
forecast_arima2
## Point Forecast Lo 99.5 Hi 99.5
## Jan 2020 4.096669 3.902574 4.290764
## Feb 2020 4.229977 3.867452 4.592501
## Mar 2020 4.332123 3.738809 4.925437
## Apr 2020 4.433075 3.586449 5.279701
## May 2020 4.501659 3.383952 5.619367
## Jun 2020 4.578219 3.189472 5.966965
## Jul 2020 4.627073 2.966072 6.288073
## Aug 2020 4.653120 2.724142 6.582098
## Sep 2020 4.668446 2.471881 6.865011
## Oct 2020 4.675410 2.219469 7.131351
## Nov 2020 4.679289 1.969602 7.388976
## Dec 2020 4.690124 1.734497 7.645752
forecast_arima_N2_2 = forecast(auto_fit2,h = 24,level=c(99.5))
forecast_arima_N2_2
## Point Forecast Lo 99.5 Hi 99.5
## Jan 2020 4.096669 3.90257376 4.290764
## Feb 2020 4.229977 3.86745200 4.592501
## Mar 2020 4.332123 3.73880934 4.925437
## Apr 2020 4.433075 3.58644870 5.279701
## May 2020 4.501659 3.38395227 5.619367
## Jun 2020 4.578219 3.18947225 5.966965
## Jul 2020 4.627073 2.96607248 6.288073
## Aug 2020 4.653120 2.72414178 6.582098
## Sep 2020 4.668446 2.47188128 6.865011
## Oct 2020 4.675410 2.21946896 7.131351
## Nov 2020 4.679289 1.96960187 7.388976
## Dec 2020 4.690124 1.73449668 7.645752
## Jan 2021 4.697726 1.51239187 7.883060
## Feb 2021 4.728808 1.32769641 8.129920
## Mar 2021 4.759328 1.15626983 8.362385
## Apr 2021 4.781986 0.98931218 8.574660
## May 2021 4.800928 0.82682504 8.775031
## Jun 2021 4.821846 0.67576556 8.967926
## Jul 2021 4.828969 0.51816919 9.139770
## Aug 2021 4.827170 0.35867067 9.295669
## Sep 2021 4.827129 0.20537855 9.448879
## Oct 2021 4.827348 0.05826121 9.596435
## Nov 2021 4.815239 -0.09657328 9.727051
## Dec 2021 4.799516 -0.25034390 9.849376
plot(forecast_arima_N2_2, main = "Unemployment Next Two Year from ARIMA without windowing")
ARIMA Forecasting Using the Full Historical Series — Summary Analysis
Model used -ARIMA(3,1,0)(2,0,0)[12] with drift
MAPE ~ 0.76% → extremely accurate (sub-1% error is unusually strong for macro data)
Forecast Interpretation (Next 1–2 Years) > Next Year: 4.09->4.69 a slight increase > Next Two Year: 4.09 - 4.8 Signaling a continued but moderate upward trend.
Overall: The full-sample ARIMA model suggests unemployment has reached a cyclical bottom and is now expected to rise gradually over the next two years. However, although the full-sample ARIMA model is statistically accurate, it mixes multiple historical economic regimes and overweights old high-unemployment periods. This leads to biased and less realistic forecasts for today’s labor market. A recent data window better reflects current structural conditions and produces more relevant short-term predictions.
Overall -In the most recent years, the unemployment rate becomes much more stable, fluctuating within a narrow band around 3.8–4.2% with very consistent seasonality. There is no strong upward or downward trend, and variability is lower than earlier decades, making the series highly predictable in the short term.
Based on the selected models and forecasts: - Next Year: Both Holt–Winters and ARIMA predict unemployment will remain stable around 3.8–4.2%, with only minor month-to-month fluctuations and no major upward or downward shifts. - Next Two Years: The models collectively indicate continued stability, with values likely holding in the 3.7–4.3% range, suggesting no significant structural changes in the unemployment trend.
Rank Model 1. Moving Average- Best accuracy (lowest MAPE). 2. ARIMA – Highest accuracy, balanced fit, reliable residuals 3. Holt–Winters Additive – Strong seasonal modeling, good accuracy 4. Simple Exponential Smoothing (SES) – Reasonable but lacks seasonal structure 5. Naïve Model – Weakest; fails to capture trend or seasonality